What Makes for Good Views for Contrastive Learning?
Contrastive learning between multiple views of the data has recently achieved state-of-the-art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In this paper, we use theoretical and empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI. We also consider data augmentation as a way to reduce MI, and show that increasing data augmentation indeed leads to decreasing MI and improves downstream classification accuracy. As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification (73% top-1 linear readout with a ResNet-50).
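To make the objective behind this abstract concrete, the sketch below shows a generic two-view InfoNCE contrastive loss in PyTorch. It is a minimal illustration under our own assumptions (the function name `info_nce_loss`, the temperature value, and the premise that `z1`/`z2` are already-encoded embeddings of two views of the same batch), not the paper's released implementation.

```python
# Minimal sketch of a two-view InfoNCE contrastive objective (illustrative only,
# not the paper's code). Matching rows of z1 and z2 are positive pairs; every
# other cross-view pair in the batch acts as a negative.
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of view embeddings, each (N, D)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

Minimizing this loss maximizes a lower bound on the MI between the two views' embeddings, which is why the abstract's question of how much MI the views should share in the first place directly shapes what the encoder retains.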
Review for NeurIPS paper: What Makes for Good Views for Contrastive Learning?
Weaknesses: • In line 121, the optimal z* is derived under the assumption that the downstream task is known (lines 145-146). However, for a pretrained model that is meant to serve multiple downstream tasks, the optimal z* for one task might not be close to optimal for another. In other words, the z* selected by the proposed method might generalize worse than the z learned by standard contrastive learning. A clear example of this is the result in Table 1: to demonstrate that optimal views depend on the downstream task, the authors construct a toy experiment, which shows that when views are selected only for one specific downstream task, the learned representation does not generalize well to other tasks.
Review for NeurIPS paper: What Makes for Good Views for Contrastive Learning?
The paper studies contrastive methods for self-supervised representation learning. It examines how multiple views of the same data are used for representation learning, and how the mutual information between these views affects downstream performance. The authors propose a theory that there is a sweet spot in the amount of mutual information between two views (neither too little nor too much) at which downstream performance is highest. They empirically verify this theory for two classes of views (patches and colors). They also propose a method that simply combines existing augmentations from prior work and provides gains over them.
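The "sweet spot" in this review is usually made quantitative through the standard InfoNCE bound on mutual information; the following is a reconstruction of that common bound rather than a formula quoted from the paper. With a contrastive set of $N$ samples (one positive and $N-1$ negatives) and contrastive loss $\mathcal{L}_{\mathrm{NCE}}$,

$$I(v_1; v_2) \;\geq\; \log N - \mathcal{L}_{\mathrm{NCE}},$$

so view choices (e.g., stronger augmentation) that lower $I(v_1; v_2)$ also cap how much information the two representations can share; the theory says downstream accuracy peaks when the MI that remains is mostly task-relevant.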
Parametric Augmentation for Time Series Contrastive Learning
Xu Zheng, Tianchun Wang, Wei Cheng, Aitian Ma, Haifeng Chen, Mo Sha, Dongsheng Luo
Modern techniques like contrastive learning have been used effectively in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that help the model learn robust and discriminative representations is a crucial stage in contrastive learning approaches. The selection of suitable data augmentations is usually guided by preset human intuition. This rule of thumb works well in the vision and language domains, where the relevant patterns are easily recognized by humans. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a contrastive learning framework with parametric augmentation, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on univariate forecasting tasks demonstrate the highly competitive results of our method, with an average 6.5% reduction in MSE and 4.7% in MAE over the leading baselines. In classification tasks, AutoTCL achieves a 1.2% increase in average accuracy.
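To illustrate what "parametric augmentation" can look like in code, here is a minimal PyTorch-style sketch of a learnable time-series augmentation module. It is our simplified illustration of the general idea, not AutoTCL's actual architecture; the class name `ParametricAugment` and the mask-plus-scale parameterization are assumptions made for this example.

```python
# Hedged sketch: a learnable augmentation that produces a positive view of a time
# series via an input-conditioned soft mask and mild per-step magnitude scaling.
# Illustrative of parametric augmentation in general, not the AutoTCL model.
import torch
import torch.nn as nn

class ParametricAugment(nn.Module):
    def __init__(self, in_channels: int, hidden: int = 32):
        super().__init__()
        # A small conv net predicts, per time step, a keep-probability and a scale factor.
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 2 * in_channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        mask_logits, log_scale = self.net(x).chunk(2, dim=1)
        mask = torch.sigmoid(mask_logits)                 # soft masking in [0, 1]
        scale = torch.exp(0.1 * torch.tanh(log_scale))    # mild magnitude warping
        return x * mask * scale                           # augmented (positive) view
```

An encoder would embed both `x` and `ParametricAugment(x)` and train with a contrastive loss such as the InfoNCE sketch above, optionally optimizing the augmentation parameters jointly so that views stay informative while nuisance variation is removed.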